1)  INTRODUCTION

Consider a network of N processors (nodes) with communication
lines (edges) between any two of them, so that the underlying
graph is complete. The nodes have distinct identities that are
ordered and they locally number their adjacent edges from 1 to
N-1, but they do not initially know the identities of the edge
destinations. The nodes are initially asleep. An arbitrary
nonempty subset wakes up spontaneously and starts executing a common
asynchronous distributed algorithm that consists of sending and
receiving messages over edges and processing them. Messages
arrive without error, after an arbitrary but finite delay.


We are interested in designing an algorithm for all nodes to
agree on a common leader, using as few messages as possible.
This is equivalent (up to O(N) messages) to the problems of
finding the node with largest identity, or of making sure that
all nodes are awake, or of finding a spanning tree. It is clear that
O(N**2) messages are enough. However, it is shown in [1] that
O(N log N) messages are both necessary and sufficient in the worst
case.

Similar problems arise in networks that are not complete. For
dense networks a high cost is paid for the inability to guarantee
that all nodes will be reached without exploring most edges. As a
result O(E) messages are required in the worst case, where E
is the number of edges. [2] gives a simple algorithm that
requires O(N log N) messages to find the leader; this is an average
over a class of random graphs and it assumes that N is initially
known. [3] offers a method with at most O(N log N) + O(E) messages
to find the leader. The same cost is incurred in [4] to solve
the more complicated problem of finding a minimum spanning tree.

The special case of a ring network has also been studied. Worst
case communication costs of O(N log N) can be achieved readily
[4], [5], [6], [7] and are also necessary [8].


This paper offers an extremely simple algorithm to find a leader
in a complete network using O(N log N) messages, each containing at
most log N + i bits, where i is the length of the representation
of a node identity. The algorithm clearly illustrates the
essential feature necessary to get a small number of messages,
namely to give priority to nodes that have already done much
work. The next section presents the algorithm. Its
communication cost is analyzed in the third section.


2)  DESCRIPTION  OF  THE  ALGORITHM 

The algorithm works by having nodes attempt to capture other
nodes, enlarging their domains. The node which captures all other
nodes becomes the leader. A captured node keeps its domain,
without trying to augment it. A node can be part of many domains
but remembers which of its edges leads to its "master", i.e. the
last node by which it has been captured.

When node A sends a TEST message to node B to attempt to capture
it, B forwards the message to its master C, which may possibly be
B itself. If the size of A's domain is larger than that of C (or
they are equal but the identity of A is larger) then C stops its
capture process (WITHOUT becoming part of A's domain) and sends
the message WINNER(A) to B. B then becomes part of A's domain
and forwards WINNER(A) to A. A continues the capture process.
To ensure that C does not receive messages from B after having
lost the fight, B is restricted to forward only one message at a
time to its master. Other messages that may arrive while the
outcome of a fight is uncertain are queued at B. They are
forwarded to B's master when the result of the previous fight is
known.

If C wins it sends WINNER(C) to B. A does not receive any reply
and so is inhibited from increasing its domain.

We  now  proceed  with  a  more  formal  description  of  the  algorithm. 
Each  node  maintains  four  variables,  one  array  and  two  message 
sets: 

STATE: its state, with values "active" or "stopped".

SIZE: the number of nodes it has captured.

MASTER: the identity of its current master.

PENDING: the number of messages to forward to the master.

EDGE_TO(id): the number of the edge leading to id (if known).

INPUT_SET: set in which arriving messages are placed.

PENDING_SET: set of messages waiting to be forwarded to the master.

Initially all sets are empty, the STATEs are "active", the SIZEs
are 0, the MASTERs are set to ID (the identity of the local node),
the PENDINGs are 0 and the EDGE_TO()s are undefined except that
EDGE_TO(ID) is set to -1. Edge -1 is an artificial "self loop"
that we introduce to simplify the description of the algorithm.

The algorithm starts when a high level protocol wakes one or
more nodes by placing the message WINNER(ID) in their INPUT_SETs.


A node waits until a message is placed in its INPUT_SET. It then
processes the message completely and either it becomes the leader
or it waits for more messages.


Node ID receiving the message WINNER(id) does as follows:

If ID = id then       \* start or response to a TEST that originated here *\
   { SIZE = SIZE + 1;
     If SIZE = N then STOP;          \* node ID is the leader *\
     If STATE = "active" then        \* no fight has been lost, continue *\
        send TEST(SIZE, ID) on edge SIZE }
else { If MASTER <> id then
          { MASTER = id;             \* record and notify new master *\
            send WINNER(id) on EDGE_TO(id) }
       PENDING = PENDING - 1;        \* forward waiting TEST messages *\
       If (PENDING > 0) then
          send a message from PENDING_SET on EDGE_TO(MASTER) }


Node ID receiving the message TEST(size, id) on edge e does as follows:

If e < SIZE then      \* message comes from node in domain *\
   { If (size, id) > (SIZE, ID) then    \* lexicographical ordering *\
        { STATE = "stopped";
          send WINNER(id) on edge e }
     else send WINNER(ID) on edge e }
else                  \* message comes from outside of domain, tell master *\
   { EDGE_TO(id) = e;
     PENDING = PENDING + 1;
     If (PENDING = 1) then send TEST(size, id) on EDGE_TO(MASTER)
     else put TEST(size, id) in PENDING_SET }
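To make the message flow concrete, here is a minimal Python sketch that transcribes the two handlers and runs them in the simplest scenario only: a single node wakes up and captures every sleeping node in turn. Several things are assumptions of the sketch rather than part of the algorithm: the function name `run_single_initiator`, the convention that local edge k of node v leads to node (v + k) mod N, and the use of one global FIFO queue in place of the individual INPUT_SETs. It does not attempt to exercise fights between several awake nodes.

```python
from collections import deque


def run_single_initiator(n, starter=0):
    """Run the handlers when only `starter` wakes up spontaneously.

    Returns (leader, number of messages sent on true edges).
    Hypothetical edge layout: edge k of node v leads to (v + k) % n.
    """
    state = ["active"] * n
    size = [0] * n
    master = list(range(n))                 # MASTER = ID initially
    pending = [0] * n
    edge_to = [{v: -1} for v in range(n)]   # EDGE_TO(ID) = -1 (self loop)
    pending_set = [deque() for _ in range(n)]
    inbox = deque()                         # stands in for all INPUT_SETs
    true_edge_msgs = 0
    leader = None

    def send(src, edge, msg):
        nonlocal true_edge_msgs
        if edge == -1:                      # artificial self loop
            inbox.append((src, -1, msg))
        else:
            true_edge_msgs += 1
            dst = (src + edge) % n
            # (src - dst) % n is the edge number as seen by the receiver
            inbox.append((dst, (src - dst) % n, msg))

    inbox.append((starter, -1, ("WINNER", starter)))  # wake-up message
    while inbox and leader is None:
        me, e, msg = inbox.popleft()
        if msg[0] == "WINNER":
            ident = msg[1]
            if ident == me:                 # start, or a capture succeeded
                size[me] += 1
                if size[me] == n:
                    leader = me
                elif state[me] == "active":
                    send(me, size[me], ("TEST", size[me], me))
            else:
                if master[me] != ident:     # record and notify new master
                    master[me] = ident
                    send(me, edge_to[me][ident], ("WINNER", ident))
                pending[me] -= 1
                if pending[me] > 0:         # forward a waiting TEST
                    send(me, edge_to[me][master[me]], pending_set[me].popleft())
        else:
            _, s, ident = msg
            if e < size[me]:                # from a node in the domain
                if (s, ident) > (size[me], me):
                    state[me] = "stopped"
                    send(me, e, ("WINNER", ident))
                else:
                    send(me, e, ("WINNER", me))
            else:                           # from outside: tell the master
                edge_to[me][ident] = e
                pending[me] += 1
                if pending[me] == 1:
                    send(me, edge_to[me][master[me]], msg)
                else:
                    pending_set[me].append(msg)
    return leader, true_edge_msgs
```

In this scenario `run_single_initiator(5, starter=2)` returns `(2, 8)`: each of the four captures costs one TEST and one WINNER on true edges, the other two messages of each capture travelling on the self loop of the captured node.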


3)  CORRECTNESS  AND  COMPLEXITY  ANALYSIS 

The algorithm must terminate, in the sense that all nodes either
stop or wait for a message but no message is in transit, because
no node can generate more than N - 1 TEST messages, each TEST
message can cause at most three other messages to be sent, and
message propagation times are finite.
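As a rough worked figure, the two counts in the paragraph above can simply be multiplied. This crude bound is a sketch added for illustration and is far from the sharper count derived later in this section:

```latex
\#\,\text{messages}
  \;\le\; \underbrace{N}_{\text{nodes}}
  \cdot \underbrace{(N-1)}_{\text{TESTs per node}}
  \cdot \underbrace{(1+3)}_{\text{a TEST and what it causes}}
  \;=\; 4N(N-1).
```

Since this number is finite and every message is delivered after a finite delay, a time is reached after which no message is in transit.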

Define the "domain" of node ID at time t as the set of nodes from
which it has received WINNER(ID), including itself if it has been
awakened by the higher level protocol. Note that the cardinality
of a domain does not decrease with time and that as time
increases each node belongs to more and more domains. After a
node becomes part of a new domain the size of the old one cannot
increase by more than one (as the master has been defeated and
will not issue new TEST messages but may still receive an answer
to an outstanding TEST) while the size of the new one becomes at
least as large as the size of the old one will ever be.

The following fact is critical to the analysis of the algorithm:
If at times T1, T2, ..., Tk respectively domains D1, D2, ..., Dk have
the same size s then they are almost disjoint, in the sense that
at most k - 1 nodes belong to two of them, and no node belongs to
more than two of them.


Here is the outline of a proof: If a domain Di has won a node
that already belonged to another domain Dj, the size of Dj was not
more than that of Di and the size of Di has increased to at least
what the size of Dj will ever be, i.e. at least s. We conclude
that if a node is common to two domains of size s, a battle must
have taken place when both domains had size s - 1 and thus its
outcome was decided on the basis of node identities. The domain Di
with smallest identity cannot win such a battle and thus at most
k - 1 of the Di's may each absorb at most one node from another
one.




Rank the nodes in decreasing order of SIZE at
termination, breaking ties arbitrarily, and denote by Sk the final
SIZE of the kth ranked node. The node with SIZE S1 and largest
ID cannot have lost a fight and thus must have become the leader. Sk
is not greater than (N + k - 1)/k because at most k - 1 nodes
belonged to two of the domains of the nodes ranked from 1 to k at
the times they reached size Sk; thus there cannot be two leaders.
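The bound on Sk can be spelled out as a counting argument. The derivation below is a sketch that uses only the almost-disjointness fact stated earlier; the symbol D_j^{(k)}, denoting the domain of the jth ranked node at the time it reached size Sk, is notation introduced here for illustration.

```latex
\begin{align*}
k\,S_k &= \sum_{j=1}^{k} \bigl|D_j^{(k)}\bigr|
          && \text{each of the $k$ domains has size } S_k \\
       &\le \Bigl|\bigcup_{j=1}^{k} D_j^{(k)}\Bigr| + (k-1)
          && \text{at most $k-1$ nodes are counted twice, none more often} \\
       &\le N + k - 1
          && \text{there are only $N$ nodes,}
\end{align*}
```

so that S_k \le (N + k - 1)/k. Taking k = 2 gives S_2 \le (N + 1)/2, which is smaller than N whenever N > 1, so at most one node can reach SIZE N.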

A total of at most 4 messages can be transmitted for each TEST
message generated, and a node generates no more TESTs than its
final SIZE. The final SIZE of the nodes that were never awakened
by the high level protocol is zero. For the other nodes we can
use the bound on Sk derived above and we conclude that the total
number of messages is no more than
4 * ((N - 1) * (1 + 1/2 + 1/3 + ... + 1/K) + K) = O(N log K),
where K is the number of awakened
nodes. This number can be tightened by noticing that the first
TEST message received by a node generates at most one other
message on a true edge and also that the algorithm could stop as
soon as SIZE is greater than (N + 1)/2.
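The bound is easy to evaluate numerically. The following Python sketch computes it exactly with rational arithmetic; the function name `message_bound` is an assumption made for the example, and the comparison with 1 + ln K relies on the standard inequality for harmonic numbers rather than on anything stated in this paper.

```python
from fractions import Fraction
import math


def message_bound(n, k):
    """Evaluate 4 * ((n - 1) * (1 + 1/2 + ... + 1/k) + k) exactly."""
    harmonic = sum(Fraction(1, j) for j in range(1, k + 1))
    return 4 * ((n - 1) * harmonic + k)


# One awake node among n: the bound reduces to 4 * n.
print(message_bound(10, 1))
# Every node awake: compare with the logarithmic estimate 1 + ln K.
n = k = 1000
print(float(message_bound(n, k)), 4 * ((n - 1) * (1 + math.log(k)) + k))
```

Because the harmonic sum grows like ln K, the bound stays within a constant factor of N log K even when all N nodes wake up.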